Abstract. One way to describe objects in images is to identify some of their characteristic points, or points of attention. The areas surrounding the attention points are described by descriptors (sets of features) in such a way that they can be identified and compared. Using these features, identical points are searched for in other images by scanning them with a sliding window. The best-known descriptors and methods for finding identical points are SIFT, SURF, GLOH, BRIEF and others. In this group of methods the displacement of identical points between video images can be arbitrary, but the accuracy of their coordinates is limited by the pixel grid of the video images and, at best, equals the interpixel distance.

Another group of methods that can be used to track identical points of video images is based on optical flow calculation. One popular point-tracking method of this kind is the Lucas-Kanade method, which calculates the displacement of points in the interpixel space by solving differential equations. Several modifications of the Lucas-Kanade method exist today. Their limitation is that the neighborhoods of the shifted points must overlap to a large extent.

The article investigates and proposes the combined application of sliding-window scanning of video images and differential calculation of optical flow, which increases the accuracy and speed of calculating the coordinates of identical points compared with finding these points by scanning alone. A more accurate calculation of the coordinates of an object's characteristic points in the interpixel space of video images leads to a more accurate determination of the position and orientation of the object in 3D space.

The simulation was carried out using a coarse search for identical points of video images described by invariant moments, followed by refinement of their coordinates with the Lucas-Kanade point tracking method. The simulation results indicate an increase in speed of almost an order of magnitude and, according to indirect estimates, an increase in the accuracy of the calculations.


Keywords: images, characteristic points, identical regions, invariant moments, optical flow. 


Introduction

One way to describe image objects is to identify some of their characteristic points, or points of interest. Objects are compared by means of such points, and detectors are used to search for and detect these attention points. If an image can be represented by a set of such attention points, both the redundancy of the information to be processed and the image search time can be reduced significantly.

To localize points of interest in an image, the local neighborhood of each pixel has to be analyzed, which gives every local feature some spatial extent. For this reason the term "area" is used instead of "attention point". The process of identifying the features of the surrounding area should determine not only the position of the attention point but also the size and, possibly, the shape of its local neighborhood. Under geometric deformations this is considerably harder, because the size and shape must be determined in an invariant way.

The regions surrounding the attention points are described by descriptors in such a way that they can be identified and compared. Constructing the descriptors yields a set of feature vectors for the initial set of characteristic points of one of the images. Based on these features, identical points are searched for in other images by scanning them with a sliding window and calculating the characteristics of the area inside the window.

SIFT (Scale-Invariant Feature Transform) [1] is one of the best-known descriptors and is also a detector. It is based on calculating a histogram of oriented gradients in the neighborhood of a keypoint. The PCA-SIFT descriptor [2] is essentially a modification of SIFT built according to the same scheme, except that a larger neighborhood is considered for each keypoint. The dimensionality of the resulting descriptor vectors is then reduced using principal component analysis (PCA).

The SURF descriptor (Speeded Up Robust Features) [3] also belongs to the descriptors that simultaneously search for singular points and build their description, invariant to scaling and rotation. In addition, the keypoint search itself is invariant in the sense that a rotated scene object has the same set of keypoints as the sample.

The GLOH descriptor (Gradient Location-Orientation Histogram) [4], designed to increase robustness, is a modification of the SIFT descriptor: the SIFT descriptor is calculated, but the neighborhood is partitioned with a polar grid.

The DAISY descriptor [5] was developed using the ideas behind the SIFT and GLOH descriptors. Like GLOH, it uses a circular neighborhood of a keypoint, but the individual blocks are represented by circles rather than by sectors.

The goal of the BRIEF descriptor (Binary Robust Independent Elementary Features) [6] was to recognize the same image areas captured from different viewing angles while reducing the amount of computation. The recognition algorithm reduces to building a random forest (randomized classification trees) or a naive Bayes classifier on a training set of images and then classifying regions of the test image. The small number of operations is achieved by representing the feature vector as a binary string and using the Hamming distance as the similarity measure. A more efficient alternative to the BRIEF and SIFT descriptors is the ORB binary descriptor [7].
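To illustrate why binary descriptors are cheap to compare, here is a minimal Python sketch of Hamming-distance matching, assuming each descriptor is packed into an integer. The 8-bit values and the helper names `hamming` and `best_match` are hypothetical; real binary descriptors are much longer (ORB, for example, uses 256 bits).

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary descriptors differ."""
    return bin(a ^ b).count("1")


def best_match(query: int, candidates: list[int]) -> tuple[int, int]:
    """Return (index, distance) of the candidate nearest to `query` in Hamming distance."""
    idx = min(range(len(candidates)), key=lambda k: hamming(query, candidates[k]))
    return idx, hamming(query, candidates[idx])


# Hypothetical 8-bit descriptors.
query = 0b10110010
candidates = [0b01001101, 0b10110011, 0b11110000]
print(best_match(query, candidates))  # (1, 1)
```

The whole comparison is one XOR and one bit count per candidate, which is the source of the small number of operations mentioned above.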

The work [8] presents a method in 
which the regions surrounding points are 
described using invariant moments. 

This group of methods is characterized by the fact that the displacement of identical points between video images can be arbitrary, but the accuracy of their coordinates is limited by the pixel grid of the video images and, at best, equals the interpixel distance.

Another group of methods that can be 
used to track (search) identical points of video 
images are methods built on the basis of 
optical flow calculation. 

There are several definitions of optical 
flow. One of them [9]: the vector field of 
apparent movement of objects (pixels), 
surfaces and edges in a visual scene between 
frames, caused by the relative movement 
between the observer (eye, camera) and the 
scene. 

Optical flow is based on the assumption that for each point of the original image I1(x, y), where x and y are the coordinates of the point, one can find a shift (δx, δy) such that the initial point corresponds to the point I2(x + δx, y + δy) in the second image. To establish the correspondence between them, some function of the point that does not change under the shift is used [10]: brightness, gradient, Hessian, Laplacian and others.

One of the popular methods of tracking video image points based on optical flow calculation is the Lucas-Kanade method, which calculates the displacement of points in the interpixel space by solving differential equations. Today the Lucas-Kanade method has several modifications [11].

In the Tomasi-Kanade method, motion is treated as a pure translation and is calculated by iteratively solving the constructed system of linear equations.

The Shi-Tomasi-Kanade method takes affine distortions into account.

The Jin-Favaro-Soatto method takes changes in illumination into account.

The limitation of these methods is that the shifted points must be close enough for the neighborhoods taken for analysis to overlap to a large extent (more than 50%).

The purpose of this work is to investigate the possibilities of finding identical points (regions) of video images by the combined application of sliding-window image scanning and point tracking based on optical flow calculation. This combination is expected to give good accuracy in finding identical points and high speed in calculating their coordinates compared with finding these points by scanning alone.

For the research, the authors used the method that describes the neighborhoods of points with invariant moments and the classic Lucas-Kanade method proposed in 1981.


Algorithm for finding identical points using invariant moments

The input data for the algorithm [8] are two frames obtained while the image objects are displaced. The steps of the algorithm are given below for one point.


Stage 1. 

Before the following procedures are carried out, the color images of the frames are converted to grayscale (halftone) and smoothed with a Gaussian filter.
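The preprocessing of Stage 1 can be sketched in plain Python as follows. The luma weights (ITU-R BT.601), the Gaussian parameters sigma = 1.0 and radius = 2, and the border handling by clamping indices are assumptions made for illustration; the text does not specify them.

```python
import math


def to_gray(rgb):
    """Convert rows of (R, G, B) tuples to grayscale with BT.601 weights (assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def gaussian_kernel(sigma=1.0, radius=2):
    """Normalized 1-D Gaussian kernel with 2*radius + 1 taps."""
    k = [math.exp(-(t * t) / (2.0 * sigma * sigma)) for t in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def gaussian_smooth(img, sigma=1.0, radius=2):
    """Separable Gaussian filter; indices outside the image are clamped to the nearest edge."""
    h, w = len(img), len(img[0])
    k = gaussian_kernel(sigma, radius)

    def clamp(v, hi):
        return max(0, min(hi, v))

    # Horizontal pass.
    tmp = [[sum(k[t + radius] * img[y][clamp(x + t, w - 1)] for t in range(-radius, radius + 1))
            for x in range(w)] for y in range(h)]
    # Vertical pass.
    return [[sum(k[t + radius] * tmp[clamp(y + t, h - 1)][x] for t in range(-radius, radius + 1))
             for x in range(w)] for y in range(h)]


frame = [[(200, 120, 40)] * 6 for _ in range(4)]  # hypothetical 6x4 color frame
smoothed = gaussian_smooth(to_gray(frame))
```

A separable filter is used because a 2-D Gaussian factorizes into two 1-D passes, which keeps the cost linear in the kernel length.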


Stage 2. 

On the first frame, a point is set and the characteristics of its neighborhood of (2·w_size + 1)×(2·w_size + 1) pixels are calculated, where w_size varies from 7 to 15. The central window has w_size = 10, i.e. a size of 21×21 pixels; reducing or increasing w_size from this value covers a change of scale of about one and a half times.

For the neighborhood of the point taken as the reference, the moments f0_et, f1_et, f2_et are calculated, together with the average brightness h_et of the pixels surrounding the given point and the average brightness values h00_et, h01_et, h10_et, h11_et of its four fragments (Fig. 1).

First, the geometric moments of the neighborhood of the point are calculated, with the origin of coordinates in the upper left corner of the image:


$$M^{mn}(x,y) = \sum_{i=x-w\_size}^{x+w\_size}\ \sum_{j=y-w\_size}^{y+w\_size} i^{m}\, j^{n}\, p_{ij}, \qquad m + n = k,$$

where:

k is the order of the geometric moments;

x, y are the coordinates of the given point;

M^{mn} are the geometric moments of order k, hereafter M^{00}, M^{01}, M^{10}, M^{11}, M^{02}, M^{20} for k up to 2;

p_ij is the brightness of the pixel with coordinates i, j.


Fig. 1. The neighborhood of a given point, divided into fragments
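A direct (non-integral) computation of the geometric moments of Stage 2 and of the average brightness h_et can be sketched as below. The image is assumed to be a list of rows, so `img[j][i]` is the brightness p_ij at column i and row j with the origin in the upper left corner; the function names and the uniform test image are hypothetical.

```python
def raw_moments(img, x, y, w_size, max_order=2):
    """Geometric moments M^{mn} = sum of i^m * j^n * p_ij over the window centered at (x, y).

    Returns a dict keyed by (m, n) for all m + n <= max_order.
    """
    window = [(i, j) for j in range(y - w_size, y + w_size + 1)
              for i in range(x - w_size, x + w_size + 1)]
    return {
        (m, n): sum((i ** m) * (j ** n) * img[j][i] for i, j in window)
        for m in range(max_order + 1)
        for n in range(max_order + 1 - m)
    }


def mean_brightness(moments, w_size):
    """h = M^{00} / ((2*w_size + 1) * (2*w_size + 1))."""
    side = 2 * w_size + 1
    return moments[(0, 0)] / (side * side)


img = [[1] * 30 for _ in range(30)]  # hypothetical uniform 30x30 image
M = raw_moments(img, x=15, y=15, w_size=2)
print(M[(0, 0)], M[(1, 0)], mean_brightness(M, 2))  # 25 375 1.0
```

For max_order = 2 the dictionary holds exactly the six moments listed above.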


Then transformations corresponding to normalization with respect to displacement, scaling and rotation are carried out.

The transformations of the moments with respect to shift are defined as follows:

$$\begin{aligned}
\mu_{00} &= M^{00},\\
\mu_{01} &= M^{01} - \Delta y\, M^{00},\\
\mu_{10} &= M^{10} - \Delta x\, M^{00},\\
\mu_{02} &= M^{02} - 2\Delta y\, M^{01} + \Delta y^{2} M^{00},\\
\mu_{20} &= M^{20} - 2\Delta x\, M^{10} + \Delta x^{2} M^{00},\\
\mu_{11} &= M^{11} - \Delta y\, M^{10} - \Delta x\, M^{01} + \Delta x\, \Delta y\, M^{00},
\end{aligned}$$

where Δx, Δy are the distances from the upper left corner of the image to the specified point (x, y), which is taken as the center of coordinates.
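The shift transformations can be checked numerically: applying them to the raw moments should give the same result as summing (i - Δx)^m (j - Δy)^n p_ij directly. The sketch below does this in Python; the test image and the helper names are hypothetical.

```python
def raw_moments(img, x, y, w):
    """M^{mn} about the image origin, m + n <= 2, over the (2w+1)^2 window at (x, y)."""
    pts = [(i, j) for j in range(y - w, y + w + 1) for i in range(x - w, x + w + 1)]
    return {(m, n): sum(i ** m * j ** n * img[j][i] for i, j in pts)
            for m in range(3) for n in range(3 - m)}


def shift_moments(M, dx, dy):
    """Moments mu_{mn} about (dx, dy), using the shift expansions given in the text."""
    return {
        (0, 0): M[(0, 0)],
        (0, 1): M[(0, 1)] - dy * M[(0, 0)],
        (1, 0): M[(1, 0)] - dx * M[(0, 0)],
        (0, 2): M[(0, 2)] - 2 * dy * M[(0, 1)] + dy ** 2 * M[(0, 0)],
        (2, 0): M[(2, 0)] - 2 * dx * M[(1, 0)] + dx ** 2 * M[(0, 0)],
        (1, 1): M[(1, 1)] - dy * M[(1, 0)] - dx * M[(0, 1)] + dx * dy * M[(0, 0)],
    }


def direct_central(img, x, y, w):
    """Reference: sum of (i - x)^m (j - y)^n p_ij computed directly."""
    pts = [(i, j) for j in range(y - w, y + w + 1) for i in range(x - w, x + w + 1)]
    return {(m, n): sum((i - x) ** m * (j - y) ** n * img[j][i] for i, j in pts)
            for m in range(3) for n in range(3 - m)}


img = [[(3 * i + 7 * j) % 11 for i in range(20)] for j in range(20)]  # hypothetical image
print(shift_moments(raw_moments(img, 9, 8, 3), 9, 8) == direct_central(img, 9, 8, 3))  # True
```

With integer brightness values both sides are computed exactly, so the comparison is an equality rather than a tolerance check.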
Normalization with respect to scale:

$$\eta_{mn} = \frac{\mu_{mn}}{\mu_{00}^{(m+n)/2 + 1}}.$$
The invariant Hu moments of the image region, normalized with respect to translation, scaling and rotation, are:

$$f_0 = \eta_{20} + \eta_{02}, \qquad f_2 = (\eta_{20} - \eta_{02})^{2} + 4\eta_{11}^{2},$$

with f_1 a further invariant formed from the normalized moments η_{20} and η_{02} [8]. Here f_0, f_1, f_2 are invariant moments computed about the given point (x, y), which in general is not the center of gravity of the region.
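A minimal sketch of the scale normalization and of the invariants f_0 and f_2, with moments taken about the center of a square patch (the given point); f_1 is omitted from this sketch. The 5×5 patch and the function name are hypothetical. Rotating the patch by 90 degrees swaps η_20 and η_02 and changes the sign of η_11, so both invariants should remain the same.

```python
def invariants(patch):
    """Return (f0, f2) for a square patch with odd side, moments about its center."""
    size = len(patch)
    c = size // 2
    mu = {
        (m, n): sum((i - c) ** m * (j - c) ** n * patch[j][i]
                    for j in range(size) for i in range(size))
        for (m, n) in [(0, 0), (2, 0), (0, 2), (1, 1)]
    }

    def eta(m, n):
        # Scale normalization: mu_{mn} / mu_00^((m + n)/2 + 1).
        return mu[(m, n)] / mu[(0, 0)] ** ((m + n) / 2 + 1)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    f0 = e20 + e02
    f2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return f0, f2


patch = [[(3 * i + 5 * j) % 7 + 1 for i in range(5)] for j in range(5)]  # hypothetical patch
rotated = [list(row) for row in zip(*patch[::-1])]                        # 90-degree rotation
print(invariants(patch))
print(invariants(rotated))  # same values up to rounding
```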

The average brightness of the neighborhood or of its fragments is calculated as the moment M^{00} of the neighborhood or fragment divided by the number of pixels in it; for the neighborhood:

$$h\_et = \frac{M^{00}}{(2\cdot w\_size + 1)\cdot(2\cdot w\_size + 1)},$$

or for the fragment with index 00 (see Fig. 1):

$$h00\_et = \frac{M^{00}_{00}}{(w\_size + 1)\cdot(w\_size + 1)}.$$


Stage 3. 
For the second frame, the integral representation [8] is calculated for each of the moments of order k:

$$FM^{l,\,k-l}(x,y) = \sum_{i=0}^{x}\sum_{j=0}^{y} i^{l}\, j^{k-l}\, p_{ij}, \qquad l = 0,\dots,k,$$

where

x = w_size, ..., image width - w_size - 1;
y = w_size, ..., image height - w_size - 1.

An integral representation is a matrix of the same size as the image. For k up to 2 there are six such representations, denoted FM^{00}, FM^{01}, FM^{10}, FM^{11}, FM^{02}, FM^{20}. The element FM^{l,k-l}(x, y) is the sum of the geometric moments of the individual image pixels located to the left of and above the coordinate (x, y), with the origin (0, 0) in the upper left corner.

The integral image is scanned sequentially with the central window (w_size = 10). For each point, the geometric moments of all orders are calculated from the integral representations [8], followed, as in Stage 2, by all invariant characteristics of the region: f0_ob, f1_ob, f2_ob, h_ob, h00_ob, h01_ob, h10_ob, h11_ob.
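The integral representations of Stage 3 and the four-lookup evaluation of a window moment can be sketched as below. This is an assumption about how the computation in [8] may be organized: lookups at negative indices are treated as zero so that windows touching the top or left border (x = w_size) also work, and the helper names are hypothetical.

```python
def integral_moments(img, max_order=2):
    """FM^{mn}(x, y): sum of i^m * j^n * p_ij over all pixels with i <= x and j <= y."""
    h, w = len(img), len(img[0])
    tables = {}
    for m in range(max_order + 1):
        for n in range(max_order + 1 - m):
            fm = [[0] * w for _ in range(h)]
            for j in range(h):
                row_sum = 0
                for i in range(w):
                    row_sum += i ** m * j ** n * img[j][i]
                    fm[j][i] = row_sum + (fm[j - 1][i] if j > 0 else 0)
            tables[(m, n)] = fm
    return tables


def _at(fm, i, j):
    # Entries above or to the left of the image contribute nothing.
    return fm[j][i] if i >= 0 and j >= 0 else 0


def window_moment(fm, x, y, w_size):
    """Moment over the (2*w_size + 1)^2 window centered at (x, y) from four lookups."""
    x0, y0 = x - w_size - 1, y - w_size - 1
    x1, y1 = x + w_size, y + w_size
    return _at(fm, x1, y1) - _at(fm, x0, y1) - _at(fm, x1, y0) + _at(fm, x0, y0)


img = [[(3 * i + 5 * j) % 9 for i in range(12)] for j in range(10)]  # hypothetical image
FM = integral_moments(img)
direct = sum(i * j * img[j][i] for j in range(3, 8) for i in range(5, 10))
print(window_moment(FM[(1, 1)], x=7, y=5, w_size=2) == direct)  # True
```

Once the six tables are built, each window moment costs four lookups regardless of w_size, which is what makes scanning every pixel practical.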

The relative errors of the characteristics, delta_, are calculated, for example for f0:

delta_f0 = 1 - min(f0_et, f0_ob) / max(f0_et, f0_ob),

as well as the total squared error

delta = delta_f0^2 + delta_f1^2 + delta_f2^2,

and the match is checked against one of the conditions of [8], which compare the errors with thresholds in different combinations. If the condition is met, the following parameters are saved:


min_delta=delta; 
result_x=x; 
result_y=y; 
result_w_size= w_size. 
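The error measures and the saving of the best match can be sketched as follows. The dictionary layout and the function names are hypothetical, and the threshold conditions of [8] are assumed to have been checked before `update_best` is called.

```python
def rel_error(a: float, b: float) -> float:
    """delta = 1 - min(a, b) / max(a, b); assumes both characteristics are positive."""
    return 1.0 - min(a, b) / max(a, b)


def total_squared_error(ref: dict, cand: dict) -> float:
    """delta = delta_f0^2 + delta_f1^2 + delta_f2^2."""
    return sum(rel_error(ref[k], cand[k]) ** 2 for k in ("f0", "f1", "f2"))


def update_best(best: dict, delta: float, x: int, y: int, w_size: int) -> dict:
    """Save min_delta, result_x, result_y, result_w_size if this candidate is better."""
    if delta < best["min_delta"]:
        return {"min_delta": delta, "result_x": x, "result_y": y, "result_w_size": w_size}
    return best


best = {"min_delta": float("inf"), "result_x": None, "result_y": None, "result_w_size": None}
ref = {"f0": 1.0, "f1": 2.0, "f2": 4.0}   # hypothetical reference features (_et)
cand = {"f0": 1.0, "f1": 4.0, "f2": 2.0}  # hypothetical window features (_ob)
best = update_best(best, total_squared_error(ref, cand), x=120, y=64, w_size=10)
print(best)  # {'min_delta': 0.5, 'result_x': 120, 'result_y': 64, 'result_w_size': 10}
```

The min/max form makes each relative error symmetric in its arguments and keeps it between 0 and 1 for positive characteristics.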


After the entire image has been scanned and compared with the characteristics of the references for all w_size = 7, ..., 15, we obtain the coordinates (result_x, result_y) of the point in the second frame that is identical to the specified point in the first frame, as well as the resulting window size result_w_size, which indicates the change in scale.


Lucas-Kanade algorithm 
The Lucas-Kanade method is widely used for estimating the motion of objects [12]. It belongs to the local methods of optical flow calculation, since it processes the pixels in the neighborhood of a given point.
This method assumes that:
a) the shift of points between the previous and current images is small;
b) the displacement of all points in the neighborhood of a given point is the same;
c) pixel intensity values do not change over time:

$$I(x, y, t) - I(x + \delta x,\, y + \delta y,\, t + \delta t) = 0,$$

where I(x, y, t) is the intensity of the pixel with coordinates (x, y) in frame t, and (δx, δy) is the pixel displacement between the successive frames t and t + δt.

Let D be the set of points in the neighborhood of a point P.

Since the displacement is small, expanding the intensity function of each point in a Taylor series up to the linear term yields a system of equations that is solved by the method of weighted least squares. The weighting factors of the image pixels are determined by a function W(x, y). According to this method, the solution is found by minimizing the error:


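$$\varepsilon(\mathbf{v}) = \sum_{(x,y)\in D} W(x,y)\,\big[I(x,y,t) - I(x+\delta x,\, y+\delta y,\, t+\delta t)\big]^{2},$$

which after linearization becomes, up to the constant factor δt² that does not affect the minimum,

$$\varepsilon(\mathbf{v}) \approx \sum_{(x,y)\in D} W(x,y)\left(\frac{\partial I}{\partial x}v_x + \frac{\partial I}{\partial y}v_y + \frac{\partial I}{\partial t}\right)^{2},$$

where v = (v_x, v_y) is the displacement velocity along the corresponding coordinates.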
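To find the minimum of the error, its partial derivatives are set to zero:

$$\frac{\partial \varepsilon(\mathbf{v})}{\partial v_x} = 0, \qquad \frac{\partial \varepsilon(\mathbf{v})}{\partial v_y} = 0.$$

As a result, we obtain the system of equations: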


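$$\begin{aligned}
\sum_{(x,y)\in D} W(x,y)\left[\left(\frac{\partial I}{\partial x}\right)^{2} v_x + \frac{\partial I}{\partial x}\frac{\partial I}{\partial y}\, v_y + \frac{\partial I}{\partial x}\frac{\partial I}{\partial t}\right] &= 0,\\
\sum_{(x,y)\in D} W(x,y)\left[\frac{\partial I}{\partial x}\frac{\partial I}{\partial y}\, v_x + \left(\frac{\partial I}{\partial y}\right)^{2} v_y + \frac{\partial I}{\partial y}\frac{\partial I}{\partial t}\right] &= 0.
\end{aligned}$$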


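These equations can be written in matrix form A·v + B = 0,

where

$$A = \begin{pmatrix} \sum_{D} W \left(\frac{\partial I}{\partial x}\right)^{2} & \sum_{D} W \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} \\ \sum_{D} W \frac{\partial I}{\partial x}\frac{\partial I}{\partial y} & \sum_{D} W \left(\frac{\partial I}{\partial y}\right)^{2} \end{pmatrix}, \qquad B = \begin{pmatrix} \sum_{D} W \frac{\partial I}{\partial x}\frac{\partial I}{\partial t} \\ \sum_{D} W \frac{\partial I}{\partial y}\frac{\partial I}{\partial t} \end{pmatrix},$$

with the sums taken over (x, y) ∈ D and W = W(x, y).

Accordingly, v = -A^{-1}·B.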
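The solution v = -A^{-1}·B of the 2×2 system can be written out explicitly. The sketch below assumes that the derivatives ∂I/∂x, ∂I/∂y, ∂I/∂t and the weights W have already been sampled at the points of D and are passed as flat lists; the function name and the sample values are hypothetical.

```python
def lk_solve(Ix, Iy, It, W):
    """Return v = (vx, vy) solving A v + B = 0 for the weighted least-squares system."""
    a11 = sum(w * gx * gx for w, gx in zip(W, Ix))
    a12 = sum(w * gx * gy for w, gx, gy in zip(W, Ix, Iy))
    a22 = sum(w * gy * gy for w, gy in zip(W, Iy))
    b1 = sum(w * gx * gt for w, gx, gt in zip(W, Ix, It))
    b2 = sum(w * gy * gt for w, gy, gt in zip(W, Iy, It))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("A is singular: the neighborhood lacks gradients in two directions")
    # A^{-1} = [[a22, -a12], [-a12, a11]] / det, hence v = -A^{-1} B.
    vx = -(a22 * b1 - a12 * b2) / det
    vy = -(a11 * b2 - a12 * b1) / det
    return vx, vy


# Hypothetical derivatives consistent with a motion of (0.4, -0.25) pixel.
Ix = [1.0, 0.0, 2.0, 1.0, -1.0]
Iy = [0.0, 1.0, 1.0, -2.0, 1.0]
It = [-(gx * 0.4 + gy * -0.25) for gx, gy in zip(Ix, Iy)]
print(lk_solve(Ix, Iy, It, W=[1.0] * 5))  # approximately (0.4, -0.25)
```

The singularity check reflects the aperture problem: if all gradients in D point in one direction, A has rank 1 and the displacement is not determined.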

If the interval between the acquisition of the previous and current frames is taken as the unit of time, then (v_x, v_y) is the displacement of the point along the corresponding coordinates, which in general is not an integer. That is, the coordinates are calculated in the interpixel space. The final coordinates of the point in the current frame are calculated by an iterative procedure that determines intermediate displacements of the point and its intermediate coordinates. The pixel intensities surrounding the initial, final and intermediate points are interpolated. The iterative procedure ends when the maximum number of iterations is reached or when the magnitude of the displacement, $\sqrt{v_x^2 + v_y^2}$, becomes less than the specified accuracy.
This algorithm is simple and fast, so it 
is quite effective in many cases. 
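The iterative refinement described above can be sketched as follows. Assumptions not taken from the text: uniform weights W = 1, central-difference gradients computed on the previous frame, bilinear interpolation of the current frame, and an accuracy eps = 1e-3 pixel; the function names are hypothetical. The test frames are a synthetic paraboloid, for which this scheme recovers the sub-pixel shift essentially exactly.

```python
import math


def bilinear(img, fx, fy):
    """Bilinearly interpolated brightness of img (list of rows) at the real point (fx, fy)."""
    x0, y0 = math.floor(fx), math.floor(fy)
    ax, ay = fx - x0, fy - y0
    return ((1 - ax) * (1 - ay) * img[y0][x0] + ax * (1 - ay) * img[y0][x0 + 1]
            + (1 - ax) * ay * img[y0 + 1][x0] + ax * ay * img[y0 + 1][x0 + 1])


def track_point(I1, I2, px, py, half=7, eps=1e-3, max_iter=20):
    """Iterative Lucas-Kanade: displacement (dx, dy) of point (px, py) from frame I1 to I2."""
    pts = [(px + i, py + j) for j in range(-half, half + 1) for i in range(-half, half + 1)]
    Ix = [(I1[y][x + 1] - I1[y][x - 1]) / 2 for x, y in pts]
    Iy = [(I1[y + 1][x] - I1[y - 1][x]) / 2 for x, y in pts]
    a11 = sum(g * g for g in Ix)
    a12 = sum(g * h for g, h in zip(Ix, Iy))
    a22 = sum(h * h for h in Iy)
    det = a11 * a22 - a12 * a12
    dx = dy = 0.0
    for it in range(1, max_iter + 1):
        # Brightness difference between the interpolated current frame and the reference.
        It = [bilinear(I2, x + dx, y + dy) - I1[y][x] for x, y in pts]
        b1 = sum(g * e for g, e in zip(Ix, It))
        b2 = sum(h * e for h, e in zip(Iy, It))
        step_x = -(a22 * b1 - a12 * b2) / det
        step_y = -(a11 * b2 - a12 * b1) / det
        dx, dy = dx + step_x, dy + step_y
        if math.hypot(step_x, step_y) < eps:  # increment below the specified accuracy
            break
    return dx, dy, it


def f(x, y):
    return (x - 20) ** 2 + (y - 20) ** 2


# Synthetic frames: the content of I2 is I1 shifted by (+0.6, -0.4) pixel.
I1 = [[f(x, y) for x in range(40)] for y in range(40)]
I2 = [[f(x - 0.6, y + 0.4) for x in range(40)] for y in range(40)]
print(track_point(I1, I2, 20, 20))  # close to (0.6, -0.4), after 2 iterations
```

Keeping A fixed and only re-evaluating B at each iteration is what makes the procedure cheap; the gradients of the reference frame are computed once.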


Simulation results 

As mentioned earlier, the research was 
carried out using the scanning method [8] for 
a rough search for identical points described 
by invariant moments and specifying their 
coordinates using the Lucas-Kanade (L-K) 
method. 

Two separate frames with a size of 
640x360 pixels from the video sequence [13] 
were taken for research. Points were set on 
Frame 1, and their search was carried out on 
Frame 2. 

In this experiment, the following condition [8] was used for the coarse search for points in Frame 2:


if ( delta_f1 < min_delta
  && delta_f0 < 0.3
  && delta_f2 < 0.3
  && delta_h < 0.01
  && delta_h_00 < 0.05
  && delta_h_01 < 0.05
  && delta_h_10 < 0.05
  && delta_h_11 < 0.05 )


All pixels of the video image were scanned as centers of the 21×21-pixel sliding window, and the parameter delta_f1 was minimized.

Fig. 2 shows the video images with the numbers of the pairs of adjacent points specified in Frame 1 and found by the scanning method in Frame 2.


Fig. 2. Video images with points set on Frame 1 and found by scanning in Frame 2.


To refine the coordinates of the points, the L-K algorithm was applied with some changes and additions, namely:

1) The neighborhoods of the points in Frame 1 were taken as the references; the refinement started from the coordinates of the points obtained by the preliminary scanning of Frame 2.

2) The termination condition of the iterative process was strengthened for each point separately:

- if (previous displacement of the point < 0.01) && (current displacement of the point < 0.01) && (vector sum of the previous and current displacements of the point < 0.01) {K3 = 1};

- if (vector sum of the previous and current displacements of the point < 0.01) {K3 = 2};

- if (the specified maximum number of iterations is reached) {K3 = 3}.

The second check detects the onset of an oscillating process; K3 is the termination code of the iterative process.

3) An error check was performed at each iteration:


$$Eps(\mathbf{v}) = \sum_{(x,y)\in D} W(x,y)\,\big[I(x,y,t) - I(x+\delta x,\, y+\delta y,\, t+\delta t)\big]^{2}$$


The coordinates of the point obtained at the iteration with the minimum error, min(Eps), were taken as the result.
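The strengthened stopping rule and the choice of the iteration with minimal Eps can be sketched as below. Interpreting "displacement of the point" as the displacement increment of one iteration, and the names `termination_code` and `best_iterate`, are assumptions; the tolerance 0.01 is the value used in the conditions.

```python
import math


def termination_code(prev, cur, it, max_iter, tol=0.01):
    """Return K3 (1, 2 or 3) if the iterations should stop, otherwise 0.

    prev and cur are the previous and current displacement increments (dx, dy).
    """
    prev_norm = math.hypot(*prev)
    cur_norm = math.hypot(*cur)
    sum_norm = math.hypot(prev[0] + cur[0], prev[1] + cur[1])
    if prev_norm < tol and cur_norm < tol and sum_norm < tol:
        return 1  # converged
    if sum_norm < tol:
        return 2  # successive steps cancel out: oscillation has started
    if it >= max_iter:
        return 3  # maximum number of iterations reached
    return 0      # keep iterating


def best_iterate(history):
    """history holds (eps, x, y) per iteration; return the (x, y) with minimal Eps."""
    _, x, y = min(history)
    return x, y


print(termination_code((0.005, 0.0), (-0.003, 0.001), it=2, max_iter=10))  # 1
print(termination_code((0.3, 0.1), (-0.3, -0.1), it=3, max_iter=10))      # 2
print(best_iterate([(0.5, 10.0, 20.0), (0.1, 10.2, 19.9), (0.3, 10.3, 19.8)]))  # (10.2, 19.9)
```

The order of the checks matters: convergence (K3 = 1) is tested before oscillation (K3 = 2), because small consecutive steps also have a small vector sum.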

The results of this experiment are presented in Table 1. K3 = 1 was obtained for all points.
Table 1. Results of the experiment of scanning Frame 2 without thinning and refining the point coordinates by the L-K algorithm with condition No. 2.


Point | x/y, Frame 1 | x/y, Frame 2 (scan) | x/y, Frame 2 (L-K refinement, min(Eps)) | Eps count / No. of min(Eps)
1.1 | 196/62 | 92/61 | 92.105/61.330 | 4/4
1.2 | 197/61 | 94/60 | 93.112/60.315 | 5/5
2.1 | 219/107 | 114/106 | 114.828/106.495 | 5/5
2.2 | 220/106 | 116/105 | 115.915/105.505 | 5/2
3.1 | 446/186 | 342/185 | 342.146/184.687 | 5/5
3.2 | 447/185 | 343/184 | 343.146/183.638 | 4/4
4.1 | 475/92 | 371/90 | 370.872/90.545 | 5/5
4.2 | 474/91 | 370/90 | 369.868/89.540 | 5/2
5.1 | 491/31 | 386/29 | 386.659/29.150 | 4/4
5.2 | 492/30 | 387/28 | 387.658/28.160 | 4/4
6.1 | 609/132 | 506/130 | 504.841/130.278 | 6/6
6.2 | 608/131 | 504/132 | 503.877/129.262 | 6/6


The difference in the coordinates of a pair of neighboring points in Frame 1 and Frame 2 (for example, x_1.2 - x_1.1 and y_1.2 - y_1.1) is taken as an indirect indication of the accuracy achieved in calculating the coordinates. To reduce the influence of the change in the shooting angle between frames, pairs of adjacent points whose coordinates in Frame 1 differ by 1 in absolute value were taken. In our opinion, the deviation of the coordinate differences of adjacent points obtained by the search from unity may indicate the accuracy achieved in their calculation. The results of the comparison are presented in Table 2.


Table 2. Results of calculations of the difference 
in the coordinates of neighboring points when scanning 
without thinning. 


Pair No. | x2 - x1 / y2 - y1, Frame 1 | x2 - x1 / y2 - y1, Frame 2 (scan) | x2 - x1 / y2 - y1, Frame 2 (L-K refinement, min Eps)
1 | 1/-1 | 2/-1 | 1.007/-1.015
2 | 1/-1 | 2/-1 | 1.087/-0.99
3 | 1/-1 | 1/-1 | 1.0/-1.049
4 | -1/-1 | -1/0 | -1.004/-1.005
5 | 1/-1 | 1/-1 | 0.999/-0.99
6 | -1/-1 | -2/2 | -0.964/-1.016


As can be seen from Table 2, after refinement by the L-K algorithm the coordinate differences of adjacent points in Frame 2 differ from those in Frame 1 by less than 0.1 of the interpixel distance.
Acceleration of the search for identical points

If the scan only needs to give approximate coordinates of identical points, the sliding window can be moved by several pixels at a time to speed up the search. Table 3 presents the results of the experiment in which Frame 2 was scanned with horizontal and vertical thinning by 3 pixels. In this case, the conditions for comparing the characteristics of the reference with those calculated in the sliding window were relaxed, namely:


if ( delta_f1 < min_delta
  && delta_f0 < 0.4
  && delta_f2 < 0.4
  && delta_h < 0.015
  && delta_h_00 < 0.06
  && delta_h_01 < 0.06
  && delta_h_10 < 0.06
  && delta_h_11 < 0.06 )
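The strict condition used for the full scan and the relaxed condition used with thinning differ only in their thresholds, so both can be expressed as one parameterized check. The class, field and key names below are hypothetical; the threshold values are those given in the text.

```python
from typing import NamedTuple


class Thresholds(NamedTuple):
    f: float     # bound on delta_f0 and delta_f2
    h: float     # bound on delta_h (whole neighborhood)
    frag: float  # bound on delta_h_00 ... delta_h_11 (fragments)


STRICT = Thresholds(f=0.3, h=0.01, frag=0.05)    # condition used without thinning
RELAXED = Thresholds(f=0.4, h=0.015, frag=0.06)  # condition used with thinning by 3 pixels


def accept(d: dict, min_delta: float, t: Thresholds) -> bool:
    """True if the relative errors in d satisfy the condition with thresholds t."""
    return (d["f1"] < min_delta
            and d["f0"] < t.f and d["f2"] < t.f
            and d["h"] < t.h
            and all(d[k] < t.frag for k in ("h_00", "h_01", "h_10", "h_11")))


errors = {"f0": 0.35, "f1": 0.01, "f2": 0.1, "h": 0.012,
          "h_00": 0.055, "h_01": 0.01, "h_10": 0.0, "h_11": 0.02}  # hypothetical errors
print(accept(errors, 1.0, STRICT), accept(errors, 1.0, RELAXED))  # False True
```

Relaxing the thresholds compensates for the fact that, with a step of 3 pixels, the window is rarely centered exactly on the identical point.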


Table 4 presents the results of calculating the differences in the coordinates of neighboring points when scanning with thinning by 3 pixels. The data in this table show that after refinement by the L-K algorithm the coordinate differences of adjacent points in Frame 2 also differ from those in Frame 1 by less than 0.1 of the interpixel distance. Since the L-K algorithm takes an insignificant amount of time, the calculation is accelerated by almost an order of magnitude compared with the algorithm that checks every point of Frame 2 for identity.
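The gain from thinning can be estimated by counting the window positions for the 640×360 frames and the central window w_size = 10. This rough sketch counts only window evaluations for one window size and ignores the cost of the L-K refinement; the function name is hypothetical.

```python
def scan_positions(width, height, w_size, step):
    """Number of sliding-window centers visited with the given step (thinning)."""
    xs = range(w_size, width - w_size, step)   # x = w_size, ..., width - w_size - 1
    ys = range(w_size, height - w_size, step)  # y = w_size, ..., height - w_size - 1
    return len(xs) * len(ys)


full = scan_positions(640, 360, w_size=10, step=1)
thinned = scan_positions(640, 360, w_size=10, step=3)
print(full, thinned, round(full / thinned, 1))  # 210800 23598 8.9
```

A step of 3 pixels in both directions reduces the number of evaluated windows by a factor of about 9, consistent with the reported speed-up of almost an order of magnitude.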


Conclusions 

As a result of the research, a combined application of sliding-window scanning of video images and differential calculation of optical flow is proposed, which increases the accuracy and speed of calculating the coordinates of identical points in images compared with finding these points by scanning alone.


Table 3. Results of the experiment of scanning Frame 2 with thinning and refining the point coordinates by the L-K algorithm with the relaxed condition.


Point | x/y, Frame 1 | x/y, Frame 2 (scan) | x/y, Frame 2 (L-K refinement, min(Eps)) | Eps count / No. of min(Eps)
1.1 | 196/62 | 91/61 | 92.105/61.330 | 5/3
1.2 | 197/61 | 94/61 | 93.112/60.315 | 5/5
2.1 | 219/107 | 115/106 | 114.828/106.495 | 5/2
2.2 | 220/106 | 115/106 | 115.915/105.505 | 5/5
3.1 | 446/186 | 343/184 | 342.146/184.687 | 5/4
3.2 | 447/185 | 343/184 | 343.146/183.638 | 4/4
4.1 | 475/92 | 373/91 | 370.874/90.545 | 7/4
4.2 | 474/91 | 370/91 | 369.866/89.539 | 6/3
5.1 | 491/31 | 385/28 | 386.659/29.150 | 5/5
5.2 | 492/30 | 388/28 | 387.658/28.160 | 4/2
6.1 | 609/132 | 505/130 | 504.844/130.277 | 5/2
6.2 | 608/131 | 502/130 | 503.877/129.262 | 6/6


Table 4. Results of calculations of the difference 
in the coordinates of neighboring points when 
scanning with thinning by 3 pixels. 


Pair No. | x2 - x1 / y2 - y1, Frame 1 | x2 - x1 / y2 - y1, Frame 2 (scan) | x2 - x1 / y2 - y1, Frame 2 (L-K refinement, min Eps)
1 | 1/-1 | 3/0 | 1.007/-1.015
2 | 1/-1 | 0/0 | 1.087/-0.99
3 | 1/-1 | 0/0 | 1.0/-1.049
4 | -1/-1 | -3/0 | -1.008/-1.006
5 | 1/-1 | 3/0 | 0.999/-0.99
6 | -1/-1 | -3/0 | -0.967/-1.015


A more accurate calculation of the 
coordinates of the characteristic points of the 
object in the interpixel space of video images 
will lead to a more accurate determination of 
the position and orientation of these objects in 
3D space. 

Simulation based on the coarse search for identical points described by invariant moments, followed by refinement of their coordinates with the Lucas-Kanade point tracking method, confirmed an increase in speed of almost an order of magnitude and, according to indirect estimates, an increase in the accuracy of the calculations.

Research should be continued towards choosing various combinations of scanning and optical flow calculation methods whose combined application will make it possible to search for identical points regardless of arbitrary rotation and scale changes of video image objects and will increase the speed and accuracy of the calculations.